A georeferenced database of the edaphic biota currently available for Argentina

Abstract
Background
Soils have been studied and classified in terms of their physical and chemical characteristics, while knowledge about their biodiversity and the ecosystem processes they support lags behind. Furthermore, the scientific knowledge contributed by different researchers is dispersed and needs to be collected to bring the big picture into focus. Today, it is possible to compile the findings and data collected by different researchers and, thanks to technological advances, to analyse this information as a whole. The main objective of this work is to compile and systematise all the bibliographic information available in Argentina on the main organisms that make up soil biodiversity: Acari, Collembola and Crassiclitellata. This information will then allow us to link the composition and structure of the soil community with ecosystem processes and flows, and to estimate them at different scales and in soils with different degrees of anthropogenic impact. The database presented here gathers presence records of these taxa and their geographical locations across the entire country, while preserving the identity and authorship of each scientific work retrieved. The taxonomic resolution of the records ranges from class to subspecies; each record is registered at the taxonomic level reported by the original author. The publications were obtained from Google Scholar, Scopus and JSTOR. In addition, records were added from INEDES theses, library searches, information requested from authors cited in other articles, and unpublished works. In total, information was collected from 224 scientific publications, as well as data requested directly from some authors. The total number of registered individuals so far is 4838, of which 3049 specimens correspond to Acari, 944 to Crassiclitellata and 845 to Collembola.
New information
This work is the first to gather, in a single publication, the complete dataset of Acari, Collembola and Clitellata recorded for Argentina.


Introduction
Human activities such as housing, health, clothing and food are sustained by the use of natural resources. This use modifies the environment that serves as habitat for various biological communities, which are, in turn, an integral part of ecosystems and of the functions that occur in them. The study of biodiversity is a pressing challenge today because of the urgent need to know which organisms are present, where they are found and how biodiversity determines ecosystem functions. More importantly, it is essential to maintain the structure of the edaphic ecosystem in the face of the different uses to which it is subjected, and to understand its close relationship with the edaphic processes that provide key ecosystem services benefiting human beings (Marichal et al. 2004, Phillips et al. 2019).
Integrated indices that combine different indicators are used in a wide variety of disciplines because they capture complex, multidimensional concepts, synthesising a large amount of information in a simple and practical form. Water quality assessment, for instance, already makes extensive use of integrated indices, such as those used under the European Water Framework Directive (Chave 2001) or the United States Clean Water Act, formally the Federal Water Pollution Control Act (ACT 2002). Constructing indices from biological data could become a key tool for promoting the care and rational, sustainable use of soils. However, there are currently no well-developed indicators for terrestrial systems (Knoepp et al. 2000) applicable at the regional level. It is therefore necessary to unify criteria and to compile and synthesise existing information, so that resources are used efficiently and land use can be properly planned and regulated. This systematised information will also be a valuable resource to promote awareness and the adoption of protective measures (Rutgers et al. 2016).
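As a generic illustration of how an integrated index condenses several indicators into one score (not a method proposed in this work), the sketch below min-max normalises each indicator across sites and combines them with a weighted mean. The indicator names, weights and site values are hypothetical, and the sketch assumes every indicator is oriented so that higher values mean better soil condition.

```python
from typing import Dict, List

def min_max(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def integrated_index(sites: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted mean of min-max normalised indicators, one score per site.

    Assumes every indicator is oriented so that higher means better.
    """
    names = list(sites)
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        normalised = min_max([sites[name][indicator] for name in names])
        for name, z in zip(names, normalised):
            scores[name] += weight * z / total_weight
    return scores

# Hypothetical sites and indicators, for illustration only.
sites = {
    "site_A": {"mite_density": 100.0, "collembola_richness": 10.0, "earthworm_biomass": 20.0},
    "site_B": {"mite_density": 0.0, "collembola_richness": 0.0, "earthworm_biomass": 0.0},
    "site_C": {"mite_density": 50.0, "collembola_richness": 5.0, "earthworm_biomass": 10.0},
}
weights = {"mite_density": 1.0, "collembola_richness": 1.0, "earthworm_biomass": 1.0}
scores = integrated_index(sites, weights)
```

Min-max scaling makes indicators with different units comparable, but scores are relative to the set of sites being compared; adding a new site can shift every score.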

The importance of compiling soil fauna biodiversity data
The systematic and continuous evaluation of the components that make up the state of the soil resource requires indices and indicators that integrate and standardise complex, multidimensional information. This information must also be applicable at different scales, so that the effects of soil use can be understood and soil deterioration avoided (Alkorta et al. 2003). Standardised physical and chemical analysis techniques currently exist that evaluate the instantaneous state of the soil, but they do not capture the dynamic processes affecting structural stability and nutrient cycling, which depend on biological activity in soils. It is therefore necessary to evaluate the impact that different land uses have on the organisms that make up the soil biota and, in this way, to generate biological indices and indicators that synthesise and account for the phenomena occurring in the soil.
Soil arthropods have many particular features that make them efficient indicators of the functioning of the edaphic ecosystem. Amongst them are their great diversity, their ability to occupy microhabitats, their requirements for specific niches and their contribution to ecological cycles. In addition, they are highly sensitive to changes in environmental conditions and to disturbances. Their wide range of responses is related to characteristics such as body size, growth rates, dispersal capacity, adaptations to microclimatic conditions and short reproductive cycles, as well as to their importance in food chains, in the degradation of organic matter and in the flow of nutrients and energy through the system (Herrera and Cuevas 2003, Sanabria 2020).
The invertebrates of the soil biota are a primary link in the physical and chemical dynamics of the soil. They directly influence the formation of biogenic structures, nutrient cycling, aggregate formation, the decomposition of organic matter, soil porosity and water retention capacity (Sanabria 2020). In this work, mites (Arachnida, Acari), springtails (Entognatha, Collembola) and earthworms (Oligochaeta, Crassiclitellata) are considered. Both Acari and Collembola have characteristics that make them excellent biological indicators, a view supported by a substantial body of literature and recent studies (Bedano 2007, Socarrás and Izquierdo 2014). Clitellata, in addition to being considered good bioindicators, are regarded as ecosystem engineers because of the structural changes their activity produces in the soil.
Today, it is possible to systematise and organise large amounts of information distributed across a wide variety of formats. Technological advances in the management of large datasets make it possible to relate these data to multiple factors linked to different land-use systems and their effects on the soil ecosystem. At present, no work for Argentina collects all the available information on the biodiversity of the country's soil biota in a single place. Building such a georeferenced database of Acari, Collembola and Crassiclitellata is the first step towards knowing the biodiversity currently recognised for Argentina; this database was recently compiled by Sanabria et al. (2023).

Study area description:
The sites where the relevant taxa were found are in the Neotropical Region, in South America, specifically in the Argentine Republic. Its area is 3,761,274 km², including the territories whose sovereignty is claimed by Argentina. Because of its large extent, the country also has considerable climatic diversity, ranging from the tropical climates of the Chaco, Tucumán-Oranense and Misiones ecoregions to the cold, dry climate of Patagonia.

Design description:
The database was built in two stages. In the first, bibliographic information on Acari, Collembola and Crassiclitellata was collected. Scientific works were searched for on different online search platforms and among physical documents held by INEDES researchers and libraries. In the second stage, the data were integrated into the database, respecting the taxonomic levels and the authorship of the original researchers. The working database that compiles all the gathered information was designed following best practices of relational database design, to represent the data efficiently and to allow flexible querying.
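The actual schema of the working database is not described here; the following is a minimal sketch, assuming a normalised layout in which publications, taxa and localities are stored once and linked through an occurrence table. All table and column names, and the placeholder rows, are hypothetical. It shows how authorship and the verbatim name from the source stay attached to each occurrence while still allowing flexible joins.

```python
import sqlite3

SCHEMA = """
CREATE TABLE publication (
    publication_id INTEGER PRIMARY KEY,
    authors        TEXT NOT NULL,
    year           INTEGER NOT NULL
);
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    taxon_rank      TEXT NOT NULL,
    parent_id       INTEGER REFERENCES taxon(taxon_id)  -- links genus -> family, etc.
);
CREATE TABLE locality (
    locality_id INTEGER PRIMARY KEY,
    province    TEXT NOT NULL,
    latitude    REAL NOT NULL,
    longitude   REAL NOT NULL
);
CREATE TABLE occurrence (
    occurrence_id    INTEGER PRIMARY KEY,
    publication_id   INTEGER NOT NULL REFERENCES publication(publication_id),
    taxon_id         INTEGER NOT NULL REFERENCES taxon(taxon_id),
    locality_id      INTEGER NOT NULL REFERENCES locality(locality_id),
    verbatim_name    TEXT NOT NULL,  -- name exactly as printed in the source
    individual_count INTEGER
);
"""

# Every occurrence in one province, with its source and the name as published.
QUERY = """
SELECT t.scientific_name, o.verbatim_name, p.authors, p.year
FROM occurrence AS o
JOIN taxon       AS t ON t.taxon_id = o.taxon_id
JOIN locality    AS l ON l.locality_id = o.locality_id
JOIN publication AS p ON p.publication_id = o.publication_id
WHERE l.province = ?
ORDER BY p.year
"""

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Placeholder rows; 'Hypothetical Author' is not a real source.
    conn.execute("INSERT INTO publication VALUES (1, 'Hypothetical Author', 1999)")
    conn.execute("INSERT INTO taxon VALUES (1, 'Scheloribates', 'genus', NULL)")
    conn.execute("INSERT INTO locality VALUES (1, 'Buenos Aires', -34.6, -58.4)")
    conn.execute("INSERT INTO occurrence VALUES "
                 "(1, 1, 1, 1, 'Scheloribates aff. bidactylus', 3)")
    rows = conn.execute(QUERY, ("Buenos Aires",)).fetchall()
    conn.close()
    return rows
```

The self-referencing `parent_id` column is one way to let records registered at different ranks (class to subspecies) be rolled up to a common rank in queries.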
Funding: This project was funded by a Doctoral Scholarship awarded to María Cynthia Valeria Sanabria by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET-Argentina), through the research programme in Terrestrial Ecology of the Universidad Nacional de Luján, with the support of the Instituto de Ecología y Desarrollo Sustentable (UNLu-CONICET). Logistical support was also provided by the GBIF Argentina node, which is in charge of standards control, review and hosting of data and metadata.

Description:
The study area covers the entire territory of the Argentine Republic. Bibliographic works with information on Acari, Collembola and Clitellata were collected from different online repositories. The earliest recorded work is from 1902 and the most recent from 2023.

Sampling description: Database building:
A database was built containing all the information available for Argentina on Acari, Collembola and Crassiclitellata. The database was built in two stages, as described below.
Step description: Step one: Data collection.
A comprehensive search was performed for works on Acari, Collembola and Crassiclitellata carried out anywhere in Argentina, going as far back in time as possible (Suppl. material 1, Velazco (2023)). The sources include theses from INEDES, online searches in Google Scholar, Scopus and JSTOR, personal requests to authors mentioned in the bibliography, the Universidad de Buenos Aires library and Argentina's National Library. In each search engine, several query variations were needed to achieve higher document recall.
In Google Scholar, the initial search combined each taxon with the country, for example "Acari Argentina". Because this search returned only a few publications, a search was then run for each group in each province, such as "Acari Salta Argentina", in both Spanish and English.
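The query expansion described above (taxon × province × language) can be sketched as follows. The province list is an illustrative subset, and the Spanish search terms are assumptions rather than the exact strings used by the authors; the sketch only generates query strings and does not call any search engine.

```python
from itertools import product
from typing import Dict, List

# Illustrative subset: the actual search covered every Argentine province.
PROVINCES = ["Buenos Aires", "Córdoba", "Misiones", "Salta"]

# Search terms per language; the Spanish terms are assumptions.
TERMS: Dict[str, List[str]] = {
    "en": ["Acari", "Collembola", "Crassiclitellata"],
    "es": ["ácaros", "colémbolos", "lombrices de tierra"],
}

def build_queries(terms: Dict[str, List[str]], provinces: List[str],
                  country: str = "Argentina") -> List[str]:
    """Country-level queries first, then one query per taxon and province."""
    queries = []
    for words in terms.values():
        queries.extend(f"{word} {country}" for word in words)
        queries.extend(f"{word} {province} {country}"
                       for word, province in product(words, provinces))
    return queries

queries = build_queries(TERMS, PROVINCES)
```

With 3 terms, 4 provinces and 2 languages this yields 2 × (3 + 3 × 4) = 30 queries; the number grows linearly with the number of provinces, which is why the province-level expansion raises recall at the cost of many more searches.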
The JSTOR database was used to search for older publications. In addition, if a work cited in a publication could not be found online or in libraries, its author was contacted directly to request the data. Unpublished works were also obtained in this way.
The occurrence records were georeferenced based on the information provided in each original work, following these strategies: a) if the work reported exact coordinates, these were used; b) if the publication showed the sites only in a figure, their positions were derived from the image using a GIS tool and this approximation was accepted as valid; c) if the work gave no precise coordinates, the records were georeferenced to the nearest identifiable locations using Google Maps or Google Earth; and d) whenever possible, georeferenced locations requested from the authors of the original work were used.
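The text does not state an explicit precedence among the four strategies. The sketch below assumes that author-supplied coordinates (strategy d, "whenever possible") take priority, followed by strategies a, b and c in decreasing order of precision. The source labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed precedence, most to least preferred (labels are hypothetical):
# d) coordinates supplied by the original authors,
# a) coordinates printed in the work,
# b) positions digitised from a map figure with a GIS tool,
# c) nearest locality found in Google Maps / Google Earth.
PRIORITY = ["author_supplied", "reported_in_text",
            "digitised_from_figure", "nearest_locality"]

@dataclass
class Candidate:
    source: str
    lat: float
    lon: float

def choose_coordinates(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-priority candidate with valid decimal degrees."""
    valid = [c for c in candidates
             if c.source in PRIORITY
             and -90 <= c.lat <= 90 and -180 <= c.lon <= 180]
    if not valid:
        return None
    return min(valid, key=lambda c: PRIORITY.index(c.source))
```

Keeping the winning `source` label alongside the coordinates makes it possible to report, per record, how it was georeferenced.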
The occurrence of each taxon and its geographical location were recorded in the database, and the authorship of each scientific work was preserved by adding the corresponding author to each record.
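Since the dataset is published in Darwin Core format, each literature-derived occurrence can be written as one row of standard Darwin Core terms, with the source work kept in `associatedReferences`. The sketch below uses a subset of real Darwin Core term names; the choice of `MaterialCitation` as `basisOfRecord`, `EPSG:4326` as datum, and all values in the placeholder record (identifier, reference, coordinates) are assumptions for illustration only.

```python
import csv
import io
from typing import Dict, List

# A subset of Darwin Core occurrence terms chosen for this sketch.
DWC_FIELDS = [
    "occurrenceID", "basisOfRecord", "scientificName", "verbatimIdentification",
    "identificationQualifier", "taxonRank", "individualCount",
    "country", "countryCode", "stateProvince",
    "decimalLatitude", "decimalLongitude", "geodeticDatum",
    "georeferenceSources", "associatedReferences",
]

def to_dwc_csv(records: List[Dict[str, object]]) -> str:
    """Serialise occurrence dicts as CSV; missing terms are left empty."""
    buffer = io.StringIO()
    # extrasaction="raise" rejects keys that are not in DWC_FIELDS (catches typos).
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, extrasaction="raise")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Placeholder record; the identifier and reference are hypothetical.
record = {
    "occurrenceID": "example-0001",
    "basisOfRecord": "MaterialCitation",
    "scientificName": "Scheloribates",
    "verbatimIdentification": "Scheloribates aff. bidactylus",
    "identificationQualifier": "aff.",
    "taxonRank": "genus",
    "individualCount": 3,
    "country": "Argentina",
    "countryCode": "AR",
    "stateProvince": "Buenos Aires",
    "decimalLatitude": -34.6,
    "decimalLongitude": -58.4,
    "geodeticDatum": "EPSG:4326",
    "georeferenceSources": "Google Earth",
    "associatedReferences": "Hypothetical Author (1999)",
}
csv_text = to_dwc_csv([record])
```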
Step two: Data integration.
The names (including synonyms) used by each researcher in the original work were preserved, together with the identification keys and nomenclature the authors used in their research. However, taxonomy and nomenclature have changed over the years, so current systematic lists reflecting the present taxonomic structure were chosen for each group.
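Preserving the name as written while counting the record under a usable name can be sketched as below. Following the convention applied in this dataset, an identification flagged with a question mark, aff. or cf. is registered at the next higher rank; for a doubtful species epithet that is the genus. The function and field names are hypothetical, and the sketch does not handle subgenera in parentheses, author citations, or a doubtful genus (moving to family would need a taxonomic lookup table).

```python
from typing import NamedTuple

class Registered(NamedTuple):
    verbatim: str   # name exactly as printed in the source work
    name: str       # name under which the record is counted
    rank: str       # rank of the counted name
    qualifier: str  # "aff.", "cf.", "?" or "" when the identification is certain

def qualifier_of(token: str) -> str:
    """Return the uncertainty marker carried by a token, if any."""
    if token in ("aff.", "cf."):
        return token
    if "?" in token:
        return "?"
    return ""

def register(verbatim: str) -> Registered:
    tokens = verbatim.split()
    if not tokens:
        raise ValueError("empty name")
    genus, epithets = tokens[0], tokens[1:]
    if "?" in genus:
        # Doubtful genus: moving up to family needs a taxonomic lookup table.
        return Registered(verbatim, genus.strip("?"), "unresolved", "?")
    for token in epithets:
        marker = qualifier_of(token)
        if marker:
            # Uncertain epithet: count the record at the next higher rank (genus).
            return Registered(verbatim, genus, "genus", marker)
    ranks = {0: "genus or higher", 1: "species", 2: "subspecies"}
    rank = ranks.get(len(epithets), "unresolved")
    return Registered(verbatim, " ".join(tokens), rank, "")
```

For example, `register("Scheloribates aff. bidactylus")` keeps the verbatim string but counts the record under the genus Scheloribates.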
In order to unify the nomenclature for Acari, the systematic lists of Subias (2022) and Pachl et al. (2020) for the suborder Oribatida were used for the Sarcoptiformes. For the infraorder Astigmata, as well as for the orders Mesostigmata and Trombidiformes, the lists proposed by Krantz and Walter (2009) and Zhang (2011) were used. For Collembola, we followed the criteria put forth by Deharveng (2004), Zhang (2011) and Bernaba Lavorde and Palacio-Vargas (2020); for works that used a different taxonomic level or a classification that has fallen into disuse, the data were incorporated into the database at a higher taxonomic level.
For the order Crassiclitellata, the information compiled by Brown and Fragoso (2007) and by James and Davidson (2012) was used. We also followed the order-level classification of Oligochaeta (Annelida, Clitellata) proposed by Schmelz et al. (2021). For all taxonomic groups, when a specimen was tagged with a question mark indicating a doubtful identification (?) or with the abbreviations aff. or cf., it was registered at the next higher level (Lanteri 2000, Acosta 2007). For example, a specimen recorded as Scheloribates aff. bidactylus is counted under the genus Scheloribates.

Geographic coverage
Description: This work covers the entire territory of Argentina, since the information was collected, as described above, for the whole country.
Table 1 summarises the three principal classes recorded in the database (Fig. 1).

Taxonomic coverage
Description: This dataset of organisms of the edaphic biota in Argentina covers several taxonomic levels within Clitellata (Oligochaeta), Collembola and Acari. It includes the orders, infraorders, superfamilies and families widely recognised in the cited literature.
Table 2 summarises the numbers for the principal orders of edaphic taxa found in Argentina.
Figure 1. Site locations where specimens were collected in Argentina (Arachnida, Collembola and Clitellata).

Temporal coverage
Notes: The aim was to collect all the information available on the soil fauna of Argentina, going as far back in time as possible. The oldest cited work was published in 1902 and the most recent in 2023.

Usage licence
Usage licence: Open Data Commons Attribution License
IP rights notes: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) License.

Number of data sets: 1
Data set name: A georeferenced database of the edaphic biota currently available for Argentina
Data format: Darwin Core
Description: Soils have been studied and classified in terms of their physical and chemical characteristics, while knowledge about their biodiversity and the ecosystem processes they support lags behind. Furthermore, the scientific knowledge contributed by different researchers is dispersed and needs to be collected to bring the big picture into focus.
Today, it is possible to compile the findings and data collected by different researchers and, thanks to technological advances, to analyse this information as a whole. The main objective of this work is to compile and systematise all the bibliographic information available in Argentina on the main organisms that make up soil biodiversity: Acari, Collembola and Crassiclitellata. A second objective is to link the composition and structure of the soil community with ecosystem processes and flows, and to estimate them at different scales and in soils with different degrees of anthropogenic impact.
The database presented here gathers presence records of these taxa and their geographical locations across the entire country, while preserving the identity and authorship of each scientific work consulted. The taxonomic resolution of the records ranges from class to subspecies; each record is registered at the taxonomic level reported by the original author.
The publications were obtained from Google Scholar, Scopus and JSTOR. In addition, records were added from INEDES theses, library searches, information requested from authors cited in other articles, and unpublished works. In total, information was collected from 224 published scientific works, as well as data requested directly from some authors. The total number of registered individuals so far is 4838, of which 3049 specimens correspond to Acari, 944 to Crassiclitellata and 845 to Collembola.

Table 1.
Numbers for the principal classes of edaphic fauna registered in Argentina.

Table 2.
Numbers for the principal orders of edaphic fauna registered in the dataset.
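Per-group summaries like those in Tables 1 and 2 can be derived by grouping occurrence rows on a taxonomic column. The minimal sketch below counts rows per value and reports each value's share of the total; the column name `group` is a hypothetical choice, and the example rows reproduce only the totals reported in the abstract (3049 Acari, 944 Crassiclitellata, 845 Collembola), not the underlying records.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

def summarise(rows: Iterable[Mapping[str, str]],
              column: str) -> Dict[str, Tuple[int, float]]:
    """Count rows per value of `column` and give each value's share (%) of the total."""
    counts = Counter(row[column] for row in rows if row.get(column))
    total = sum(counts.values())
    # most_common() orders values from most to least frequent.
    return {value: (n, round(100 * n / total, 1))
            for value, n in counts.most_common()}

# Rows reproducing only the totals reported in the abstract (no other fields).
rows = ([{"group": "Acari"}] * 3049
        + [{"group": "Crassiclitellata"}] * 944
        + [{"group": "Collembola"}] * 845)
summary = summarise(rows, "group")
```

Grouping on an `order` column instead would give the per-order breakdown; records registered above that rank (e.g. only to class) would then lack a value and be skipped by the `row.get(column)` filter.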